Second-order stochastic variational inference
Authors
Abstract
Stochastic gradient descent (SGD), the workhorse of stochastic optimization, is slow both in theory (sub-linear convergence) and in practice (thousands of iterations), intuitively for two reasons: 1) Its learning-rate schedule is fixed a priori and decays to zero quickly enough to be square-summable. This schedule limits the step size and hence the rate of convergence for a Lipschitz objective function. 2) It fails to account for the objective's curvature. Extensions of SGD such as AdaGrad [7] and Adam [8] adjust for the relative scales of the parameters but not, in general, for the curvature.
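As a minimal, illustrative sketch (not taken from the paper), the snippet below contrasts a gradient method with an a-priori decaying step size against a curvature-aware Newton step on an ill-conditioned quadratic. The matrix A, the schedule 0.1/t, and the use of deterministic gradients are assumptions chosen purely for illustration of the two points above.

```python
import numpy as np

# Hypothetical ill-conditioned quadratic f(x) = 0.5 * x^T A x with very
# different curvature in its two coordinates.
A = np.diag([100.0, 1.0])
grad = lambda x: A @ x

# 1) A-priori decaying, square-summable schedule eta_t = 0.1 / t.
x = np.array([1.0, 1.0])
for t in range(1, 1001):
    eta = 0.1 / t
    x = x - eta * grad(x)
print("decaying-step descent:", x)   # still roughly halfway from the optimum
                                     # in the low-curvature direction

# 2) Newton step: rescales by the inverse Hessian, exact for a quadratic.
x = np.array([1.0, 1.0])
x = x - np.linalg.solve(A, grad(x))
print("Newton step:", x)             # the minimizer [0, 0] in one step
```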
Similar resources
Fast Second Order Stochastic Backpropagation for Variational Inference
We propose a second-order (Hessian or Hessian-free) based optimization method for variational inference inspired by Gaussian backpropagation, and argue that quasi-Newton optimization can be developed as well. This is accomplished by generalizing the gradient computation in stochastic backpropagation via a reparametrization trick with lower complexity. As an illustrative example, we apply this a...
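For intuition only, here is a hedged sketch of the pathwise (reparameterization) idea extended to second order, under assumed toy choices: with q(z) = N(mu, sigma^2) and z = mu + sigma * eps, derivatives of E[f(z)] with respect to mu can be estimated by averaging f'(z) and f''(z) over reparameterized samples. The integrand f and the damped Newton-style update are hypothetical, not the paper's method.

```python
import numpy as np

# Toy integrand with analytic derivatives so the sketch stays self-contained.
f   = lambda z: np.cos(z)
df  = lambda z: -np.sin(z)   # f'(z)
d2f = lambda z: -np.cos(z)   # f''(z)

rng = np.random.default_rng(0)
mu, sigma, S = 0.3, 0.5, 100_000
eps = rng.standard_normal(S)
z = mu + sigma * eps                 # reparameterized samples

obj     = f(z).mean()                # Monte Carlo estimate of E[f(z)]
grad_mu = df(z).mean()               # pathwise estimate of d/dmu E[f(z)]
hess_mu = d2f(z).mean()              # second derivative via one more pass

# Damped Newton-style update on mu; |hess_mu| keeps it a descent step
# even where the estimated curvature is negative.
mu_new = mu - grad_mu / (abs(hess_mu) + 1e-3)
print(obj, grad_mu, hess_mu, mu_new)
```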
Fast Black-box Variational Inference through Stochastic Trust-Region Optimization
We introduce TrustVI, a fast second-order algorithm for black-box variational inference based on trust-region optimization and the “reparameterization trick.” At each iteration, TrustVI proposes and assesses a step based on minibatches of draws from the variational distribution. The algorithm provably converges to a stationary point. We implement TrustVI in the Stan framework and compare it to ...
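The snippet below is not TrustVI itself, only a generic sketch of the trust-region accept/shrink loop such methods build on, applied to a hypothetical noisy objective standing in for a minibatch Monte Carlo estimate of the negative ELBO; all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def neg_elbo_estimate(lam, n_draws=64):
    # Toy noisy quadratic standing in for a minibatch -ELBO estimate.
    noise = rng.standard_normal() / np.sqrt(n_draws)
    return 0.5 * np.sum(lam ** 2) + noise

lam, radius = np.array([2.0, -1.5]), 1.0
for _ in range(50):
    grad = lam + 0.05 * rng.standard_normal(lam.shape)   # noisy gradient estimate
    step = -grad
    norm = np.linalg.norm(step)
    if norm > radius:
        step *= radius / norm                 # constrain proposal to the trust region
    if neg_elbo_estimate(lam + step) < neg_elbo_estimate(lam):
        lam, radius = lam + step, radius * 2.0   # accept the step, expand the region
    else:
        radius *= 0.5                            # reject the step, shrink the region
print(lam)
```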
Adaptively Setting the Learning Rate in Stochastic Variational Inference
Stochastic variational inference is a promising method for fitting large-scale probabilistic models with hidden structures. Different from traditional stochastic learning, stochastic variational inference uses the natural gradient, which is particularly efficient for computing probabilistic distributions. One of the issues in stochastic variational inference is to set an appropriate learning ra...
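For context, here is a hedged sketch of the standard stochastic natural-gradient update in SVI for a conjugate model: the step is a convex combination of the current global natural parameter and an "intermediate" parameter computed from a rescaled minibatch, with a Robbins-Monro rate of the kind this paper sets adaptively. The toy Normal-mean model and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10_000
data = rng.normal(3.0, 1.0, size=N)           # toy dataset

# Global variational natural parameters of q(mu): [precision * mean, precision],
# initialized at the N(0, 1) prior.
lam = np.array([0.0, 1.0])

def intermediate_parameter(minibatch):
    # Conjugate Normal-mean model, unit noise variance, N(0, 1) prior:
    # treat the rescaled minibatch as if it were the full dataset.
    scale = N / len(minibatch)
    return np.array([scale * minibatch.sum(), 1.0 + scale * len(minibatch)])

for t in range(1, 501):
    minibatch = rng.choice(data, size=100, replace=False)
    lam_hat = intermediate_parameter(minibatch)
    rho = (t + 1.0) ** -0.7                   # fixed Robbins-Monro schedule; the
                                              # cited paper adapts this rate instead
    lam = (1 - rho) * lam + rho * lam_hat     # natural-gradient step of size rho

print("posterior mean estimate:", lam[0] / lam[1])   # should be close to 3.0
```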
Generalizing and Scaling up Dynamic Topic Models via Inducing Point Variational Inference
Dynamic topic models (DTMs) model the evolution of prevalent themes in literature, online media, and other forms of text over time. DTMs assume that topics change continuously over time and therefore impose continuous stochastic process priors on their model parameters. In this paper, we extend the class of tractable priors from Wiener processes to the generic class of Gaussian processes (GPs)....
Collapsed Variational Bayesian Inference for PCFGs
This paper presents a collapsed variational Bayesian inference algorithm for PCFGs that has the advantages of two dominant Bayesian training algorithms for PCFGs, namely variational Bayesian inference and Markov chain Monte Carlo. In three kinds of experiments, we illustrate that our algorithm achieves close performance to the Hastings sampling algorithm while using an order of magnitude less t...
Auto-Encoding Variational Bayes
How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributi...
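A minimal sketch of the single-sample reparameterized ELBO estimator at the heart of this line of work, assuming a Gaussian q(z|x) with an analytic KL term against a standard-normal prior; the one-dimensional "decoder" likelihood is a hypothetical stand-in.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_likelihood(x, z):
    # Hypothetical decoder: p(x | z) = N(x; z, 1).
    return -0.5 * (x - z) ** 2 - 0.5 * np.log(2.0 * np.pi)

def elbo_estimate(x, mu, log_sigma):
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal()
    z = mu + sigma * eps                                       # reparameterization trick
    kl = 0.5 * (mu ** 2 + sigma ** 2 - 2.0 * log_sigma - 1.0)  # KL(q(z|x) || N(0, 1))
    return log_likelihood(x, z) - kl                           # single-sample ELBO estimate

print(elbo_estimate(x=1.2, mu=0.8, log_sigma=-0.5))
```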
Journal:
Volume, Issue:
Pages: -
Publication date: 2017